1 Executive Summary

The main aim of this research report was to investigate the where and who of Kiva fundees, as this would provide insight for both Kiva itself and potential Kiva lenders. We found that most Kiva loans were directed to the Agriculture, Food and Retail sectors, and most commonly originated from countries with a relatively low GDP per capita. Furthermore, we found no general geographic trend in the distribution of agriculture-related loans around the world. Finally, our report shows that the vast majority of Kiva fundees are women, a pattern we relate to the high levels of gender inequality in Kiva's most common sectors and countries.


2 Full Report

2.1 Initial Data Analysis (IDA)

Kiva is a service aimed at providing small loans to the world’s unbanked population.

The data set used to generate our research questions comes from the Kiva Crowdfunding “Data Science For Good” open data initiative. This initiative was created so members of the public could help Kiva better understand the levels of poverty in areas where they had active loans (“Data Science for Good: Kiva Crowdfunding” 2018). The data has a CC0: Public Domain licence, meaning that we are free to use and distribute the data as we wish (“Creative Commons — CC0 1.0 Universal” 2023).

As the data has been uploaded by Kiva themselves, for the purpose of finding real insights through an open competition on Kaggle, it is original (a primary source) and unaggregated, and can thus be considered reliable. The data set has, however, been edited by a community member called mfab, which may raise some concerns; as Kiva is listed as the ‘Owner’, we assume that they approved this editor and that the data remains reliable. A further limitation is that the data only covers 2014 to 2017, so any deviations from long-established trends resulting from the COVID-19 pandemic cannot be examined. Such an analysis would have made for an interesting report.

Some wrangling of the Kiva datasets was required to generate the plots. Most of it involved grouping and summarising to create new data frames that isolate the variables being compared. The geographical analysis in IQ 3 additionally required de-normalising the two supplementary datasets provided by Kiva (kiva_loan_ids and kiva_loan_regions) to find the 3-digit ISO code associated with every loan. This process is carried out in the data setup section below.

Potential stakeholders for this report include government organisations and charitable services such as Kiva, as they would want to know where most of the money is being requested and how it is being used.

2.2 Data setup

library(tidyverse)
library(tmap)
library(countrycode)
library(janitor)
library(plotly)
# Kiva loans dataset
kiva_loans <- read_csv("data/kiva_loans.csv")

# Two supplementary Kiva datasets describing loan themes and loan IDs
kiva_loan_ids <- read_csv("data/loan_theme_ids.csv")
kiva_loan_regions <- read_csv("data/loan_themes_by_region.csv")

# An external Kaggle dataset listing countries and their corresponding regions
countries <- read_csv("data/countries.csv")

The Kiva loan datasets are linked by two keys: id and Loan Theme ID. Because the data is normalised, merging kiva_loan_ids with kiva_loan_regions (and then with kiva_loans) allows us to extract the 3-digit ISO code for each loan, which, in combination with the tmap package, enables a geographical analysis of Kiva loans.

# tmap in-built World dataset
data("World")

kiva_loan_ids_subset <- kiva_loan_ids %>%
  select(c("id", "Loan Theme ID")) %>%
  rename(loan_theme_id = `Loan Theme ID`)

kiva_loan_regions_subset <- kiva_loan_regions %>%
  select(c("Loan Theme ID", "country", "ISO")) %>%
  rename(iso = ISO,
         loan_theme_id = `Loan Theme ID`) %>%
  distinct() # remove duplicate rows

# Perform merge
kiva_loans_themeid <- inner_join(kiva_loans, kiva_loan_ids_subset, by = "id")
kiva_loans_PKs <- inner_join(kiva_loans_themeid, kiva_loan_regions_subset, by = c("country", "loan_theme_id"))

Note that in the code above we performed inner joins across all three datasets. Because of NAs and missing data, this inevitably causes some data loss: in our case, approximately 30,000 rows (about 4% of the original dataset). We judge that the benefit of working with a complete dataset (with no missing loan theme IDs or ISO codes) justifies this small loss.
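The effect of these joins can be checked on a toy scale. The base-R sketch below uses small invented tables (hypothetical names and values) standing in for kiva_loans, kiva_loan_ids and kiva_loan_regions, and relies on merge(), which performs an inner join by default. It shows why duplicate theme rows are removed before joining, and how loans without a matching theme drop out:

```r
# Hypothetical toy tables standing in for the three Kiva files; all values are invented
loans <- data.frame(
  id = c(1, 2, 3, 4),
  country = c("Kenya", "Peru", "Kenya", "Peru")
)
theme_ids <- data.frame(
  id = c(1, 2, 3),                    # loan 4 has no theme, so it cannot survive an inner join
  loan_theme_id = c("a1", "b2", "a1")
)
theme_regions <- data.frame(
  loan_theme_id = c("a1", "a1", "b2"), # the "a1" row is duplicated on purpose
  country = c("Kenya", "Kenya", "Peru"),
  iso = c("KEN", "KEN", "PER")
)

# Without de-duplication, each duplicated theme row multiplies the matching loans
n_with_dupes <- nrow(merge(merge(loans, theme_ids, by = "id"),
                           theme_regions, by = c("country", "loan_theme_id")))

# De-duplicate, then chain the two inner joins
theme_regions <- unique(theme_regions)
with_iso <- merge(merge(loans, theme_ids, by = "id"),
                  theme_regions, by = c("country", "loan_theme_id"))

# Rows lost to the inner joins, as a count and a percentage
rows_lost <- nrow(loans) - nrow(with_iso)
pct_lost <- 100 * rows_lost / nrow(loans)
```

Here the duplicated theme row would turn 3 matched loans into 5 rows, and loan 4, which has no theme, is lost: 1 of 4 rows, or 25%, the same kind of loss described above.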

# Checking for missing (NA) values in the two columns of importance.
# is.null() only tests whether a whole column is NULL, so anyNA() is used instead.
if (anyNA(kiva_loans_PKs$iso) || anyNA(kiva_loans_PKs$loan_theme_id)) {
  print("Missing values present in `iso` or `loan_theme_id` columns.")
} else {
  print("No missing values present in `iso` or `loan_theme_id` columns.")
}
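is.null() and anyNA() answer different questions, which a toy vector makes clear (the values are hypothetical): is.null() only reports whether an object is NULL, as happens when a data frame column name does not exist, while anyNA() checks whether any individual element is NA.

```r
# Hypothetical ISO column with one missing entry
toy_df <- data.frame(iso = c("KEN", NA, "PER"))

null_check <- is.null(toy_df$iso)        # FALSE: the column exists, so it is not NULL
missing_col <- is.null(toy_df$not_a_col) # TRUE: a non-existent column returns NULL
na_check <- anyNA(toy_df$iso)            # TRUE: one element is NA
n_missing <- sum(is.na(toy_df$iso))      # 1 missing value
```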

2.3 IQ 1: Is there a relationship between a country’s GDP per capita and its total loan sum?

To answer this question, an interactive scatter plot was produced using the code below. Hovering over a data point displays the country’s name, its GDP per capita and its total loan sum. Both axes use a base-10 logarithmic scale to spread out the data so that any relationship is easier to see.

# Grouping by country and summing the loan amount (in millions of USD) for the Kiva dataset
kiva_loan_sum <- kiva_loans %>%
  group_by(country) %>%
  summarise(loan_sum = signif(sum(loan_amount) / 1e6, 5))

# Selecting the useful columns from tmap's World dataset, renaming the country column
# and dropping the sf geometry column, which a scatter plot does not need
world_gdp <- World %>%
  select(c("name", "gdp_cap_est")) %>%
  rename(country = name) %>%
  sf::st_drop_geometry()

# Merging the world_gdp and kiva_loan_sum data frames (countries present in both)
kiva_gdp <- inner_join(kiva_loan_sum, world_gdp, by = "country")

# Changing the column names for the interactive part of the plot
colnames(kiva_gdp) <- c("Country", "Loan Sum", "GDP per Capita")

# Generating the plot with both axes on a log10 scale;
# `country` is passed through so that ggplotly() shows it in the tooltip
plot_kiva_gdp <- ggplot(kiva_gdp, aes(x = `GDP per Capita`, y = `Loan Sum`, country = Country)) +
  geom_point(colour = "black") +
  scale_x_log10() +
  scale_y_log10() +
  labs(x = "GDP per capita (USD)", y = "Loan Sum (Million USD)",
       title = "Total sum of Kiva loans against the GDP per capita for each country")
ggplotly(plot_kiva_gdp)

The GDP data used in the interactive scatter plot above is taken from the World dataset in the tmap package. The points cluster slightly towards the top left of the plot, which led us to initially suspect a relationship between a lower GDP per capita and a higher amount of loaned money. However, when we attempted to fit a regression line to the data, no obvious relationship emerged. What we can discern is that the data is right-skewed, indicating that countries with lower GDPs per capita account for most of the total loan sum. Many people in these countries lack access to financial services or even basic banking, creating a need for charitable services like Kiva, whose goal is to provide such individuals with access to loans (“Learn More about Kiva’s Mission” 2023). Our data therefore suggests the free market in action: those who live in countries with low GDPs need Kiva, and Kiva needs them.
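Whether "no obvious relationship" holds can be quantified rather than judged by eye. A minimal base-R sketch, using hypothetical country-level values (not the Kiva figures), fits a linear model on the log10 scale to mirror the log-log plot, and computes Spearman's rank correlation, which depends only on ranks and so is unaffected by the log transform:

```r
# Hypothetical country-level values: invented, NOT the Kiva figures
toy <- data.frame(
  gdp_per_capita = c(800, 1500, 3000, 6000, 12000, 25000),
  loan_sum_musd  = c(5.2, 0.4, 12.1, 1.3, 0.9, 2.5)
)

# Linear fit on the log10 scale, matching the log-log axes of the plot
fit <- lm(log10(loan_sum_musd) ~ log10(gdp_per_capita), data = toy)
slope <- unname(coef(fit)[2])
r_squared <- summary(fit)$r.squared

# Spearman's rank correlation (about -0.143 for this toy data)
rho <- cor(toy$gdp_per_capita, toy$loan_sum_musd, method = "spearman")
# Taking log10 of both variables preserves the ranks, so the coefficient is unchanged
rho_log <- cor(log10(toy$gdp_per_capita), log10(toy$loan_sum_musd), method = "spearman")
```

A small R² and a Spearman coefficient near zero would support the statement that no obvious relationship exists; on the real data, the same calls would be applied to the country-level loan sums and GDP per capita values.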

2.4 IQ 2: What is the distribution of funding between the Kiva designated sectors?

kiva_loans %>%
  group_by(sector) %>%
  summarise(total_funding = signif(sum(funded_amount) / 1e6, 5)) %>%
  ggplot(aes(x = fct_reorder(sector, desc(total_funding)), y = total_funding)) +
  # Represent the totals as a column chart
  geom_col(fill = "cornflowerblue", color = "black", width = 0.8) +
  labs(x = "Sectors", y = "Total Funding (Million USD)", title = "Total Kiva funding per sector") +
  theme_classic() +
  geom_text(aes(label = total_funding), vjust = -0.5, size = 2) +
  theme(axis.text.x = element_text(angle = 60, vjust = 0.5, hjust = 0.4))

The bar plot above shows that Agriculture, Food and Retail are by far the most funded Kiva sectors: together, these three sectors received twice as much funding as the remaining 12 combined. Interestingly, the use column of the data frame shows that many loans categorised as Retail are in fact loans to purchase food items such as salt, rice and flour.
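The comparison between the top three sectors and the rest reduces to a ratio of sums. A base-R sketch on invented loan-level data (the sector names follow Kiva's, but every amount is hypothetical):

```r
# Hypothetical loan-level data: invented amounts, NOT the Kiva figures
loans <- data.frame(
  sector = c("Agriculture", "Food", "Retail", "Services",
             "Education", "Agriculture", "Food", "Housing"),
  funded_amount = c(500, 400, 300, 150, 100, 300, 200, 150)
)

# Total funding per sector, sorted from most to least funded
totals <- aggregate(funded_amount ~ sector, data = loans, FUN = sum)
totals <- totals[order(-totals$funded_amount), ]

# Compare the three most funded sectors with all the others combined
top_sectors <- as.character(head(totals$sector, 3))
top_sum <- sum(head(totals$funded_amount, 3))
rest_sum <- sum(totals$funded_amount[-(1:3)])
ratio <- top_sum / rest_sum  # 1700 / 400 = 4.25 for this toy data
```

On the real data, a ratio of about 2 would correspond to the statement above.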

library(gt)

# Show the first 10 use entries for loans categorised as Retail
kiva_loans %>%
  filter(sector == "Retail") %>%
  select(use) %>%
  slice(1:10) %>%
  gt() %>%
  tab_header(title = "Use data when the loan is categorised as Retail") %>%
  cols_label(use = "Use")
Use data when the loan is categorised as Retail
Use
to buy stock of rice, sugar and flour .
to buy packs of salts, biscuits and beverages.
to buy packs of salt, biscuits, and beverages.
to buy hair oils to sell.
to buy different kinds of knives to sell
to buy rice, sugar and flour in bulk.
To buy women's shoes to sell
to stock his store.
to buy additional items like eggs, charcoal, rice, Milo, shampoo, groceries, etc. to sell
to purchase body lotions, hair oil, jewelery, chemicals and hair conditioners for resale.

The conclusions drawn from IQ 1 tell us that Kiva loans are most typically requested in developing countries with low GDPs. In these countries access to food is not a given, and creating a reliable food supply may lift some people out of poverty. In his 2015 report, Robert Townsend states that, for the world’s poorest, growth in agriculture is two to four times more effective in raising living standards than growth in the next closest sector (Townsend 2015). This can create a win-win scenario for all stakeholders: so long as a loan is used wisely, the borrowers can create more value than they initially borrowed, and lenders can see their investment amount to more than its dollar value. As such, it is not surprising that agriculture loans make up 27% of all Kiva loans.

2.6 IQ 4: What is the average loan amount per gender in each region?

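A side-by-side bar graph was created to visualise the mean loan amounts per gender in each region. It is worth mentioning that 4,221 loans with missing gender information were excluded from the dataset before creating the graph. The filter below also excludes loans with multiple borrowers, since their borrower_genders entry lists several genders rather than a single "male" or "female".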
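Because %in% compares whole strings, the filter in the code below keeps only entries that are exactly "male" or "female". A toy illustration, with hypothetical strings written in the comma-separated form assumed here for multi-borrower loans:

```r
# Hypothetical borrower_genders values; multi-borrower loans are assumed to be comma-separated
borrower_genders <- c("female", "male", "female, female", NA, "female, male")

# NA and multi-borrower entries both fail the whole-string %in% test
keep <- !is.na(borrower_genders) & borrower_genders %in% c("male", "female")
kept <- borrower_genders[keep]
```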

## Keep only loans whose borrower_genders entry is exactly "male" or "female"
## (this drops missing values and loans with multiple borrowers)
genders_clean <- kiva_loans %>%
  filter(!is.na(borrower_genders) & borrower_genders %in% c("male", "female"))

## Finding the mean funded amount per gender in each country
summary_aggregated <- genders_clean %>%
  group_by(country, borrower_genders) %>%
  summarise(mean_funded_amount = mean(funded_amount), .groups = "drop")

## Splitting the means into separate male and female columns (0 where absent)
summary_aggregated <- summary_aggregated %>%
  mutate(
    males = ifelse(borrower_genders == "male", mean_funded_amount, 0),
    females = ifelse(borrower_genders == "female", mean_funded_amount, 0)
  )

## Collapsing to one row per country
synthesized_data <- summary_aggregated %>%
  group_by(country) %>%
  summarise(male = sum(males), female = sum(females))

## Preparing the country-to-region lookup table
countries_clean <- countries %>%
  clean_names() %>%
  select(country, region)

## Renaming the join column in both tables
synthesized_data <- rename(synthesized_data, Country = country)
countries_clean <- rename(countries_clean, Country = country)

## Combining the male and female data with the region of each country
countries_clean <- inner_join(countries_clean, synthesized_data, by = "Country")

## Summing the country-level means within each region
regions_data <- countries_clean %>%
  group_by(region) %>%
  summarise(male = sum(male), female = sum(female))

## Reshaping to long format (one row per region and gender) for plotting
regions_data <- regions_data %>%
  pivot_longer(c(male, female), names_to = "gender", values_to = "value")

## Plotting Side by Side Graph 
ggplotly(
  ggplot(data = regions_data, aes(x = region, y = value, fill = gender)) + 
  geom_bar(colour = "black", stat = "identity", position = "dodge") + 
  scale_fill_manual(values = c("male" = "cornflowerblue", "female" = "pink")) + 
  labs(x = "\nRegions", y = "Mean Loan Amount\n") + 
  ggtitle("Mean Loan Amount Per Gender in Regions") + 
  theme(
    plot.title = element_text(hjust = 0.5), 
    axis.title.x = element_text(colour = "black"), 
    axis.title.y = element_text(colour = "black"),
    axis.text.x = element_text(angle = 45, hjust = 1)
  )
)
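The code above averages loans within each country and then adds those country-level means together within each region, so the plotted value for a region grows with the number of countries it contains. A base-R sketch on invented data (regions, countries and amounts are all hypothetical) contrasts that approach with a mean taken directly over the loans in each region:

```r
# Hypothetical single-borrower loans with a region already attached (invented values)
loans <- data.frame(
  region = c("A", "A", "A", "B", "B"),
  country = c("X", "X", "Y", "Z", "Z"),
  gender = c("female", "male", "female", "female", "female"),
  funded_amount = c(100, 300, 500, 200, 400)
)

# Approach in the code above: mean per country and gender, then summed per region
country_means <- aggregate(funded_amount ~ region + country + gender, data = loans, FUN = mean)
sum_of_means <- aggregate(funded_amount ~ region + gender, data = country_means, FUN = sum)

# Alternative: mean over all loans in each region and gender
region_means <- aggregate(funded_amount ~ region + gender, data = loans, FUN = mean)

# Region A, female: 100 + 500 = 600 as a sum of country means, but (100 + 500) / 2 = 300 as a mean
a_female_sum <- sum_of_means$funded_amount[sum_of_means$region == "A" & sum_of_means$gender == "female"]
a_female_mean <- region_means$funded_amount[region_means$region == "A" & region_means$gender == "female"]
```

The two approaches agree only for regions represented by a single country, as region B is here.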

Based on the dataset, there were 3.16 female borrowers for every male borrower. The graph also shows that females had a higher mean loan amount than males in 5 of the 9 regions. Latin America and the Caribbean stood out as the region with the largest gap in mean loan amounts between males and females.
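A borrower ratio of this kind counts individual borrowers, so one way to compute it is to split multi-borrower loans into their individual genders first. A base-R sketch, assuming the comma-separated format of borrower_genders and using invented values:

```r
# Hypothetical borrower_genders values (comma-separated for multi-borrower loans)
borrower_genders <- c("female", "male", "female, female", "female, male, female", NA, "female")

# Split each loan into individual borrowers, trimming the spaces after the commas
individual <- trimws(unlist(strsplit(borrower_genders[!is.na(borrower_genders)], ",")))

n_female <- sum(individual == "female")
n_male <- sum(individual == "male")
female_per_male <- n_female / n_male  # 6 / 2 = 3 for this toy data
```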

This may be attributed to the high levels of gender inequality in this region, as reported in a study by the Inter-American Development Bank (IDB) titled “An Unequal Olympiad: Gender Equity in Latin American and Caribbean Companies” (Basco et al. 2021). The study found that women hold only 15% of management positions, that 35% of women in the workforce have access to advanced technologies, and that 6 out of 10 companies do not provide any type of maternity leave beyond what is required by law. These findings highlight the need for further research and action to address gender inequality in this region, and its potential impact on loan amounts.

References

Basco, Ana Inés, Ángeles Barral Verna, Andrea Monje Silva, Magdalena Barafani, Natalia Sant Anna Torres, and Stephanie Oueda Cruz. 2021. “Una Olimpíada Desigual: La Equidad de Género En Las Empresas Latinoamericanas y Del Caribe.” Edited by Andrea Verónica Benitez and María Florencia Merino. Inter-American Development Bank. https://doi.org/10.18235/0003427.
“Creative Commons — CC0 1.0 Universal.” 2023. https://creativecommons.org/publicdomain/zero/1.0/.
“Data Science for Good: Kiva Crowdfunding.” 2018. https://www.kaggle.com/datasets/kiva/data-science-for-good-kiva-crowdfunding.
“Learn More about Kiva’s Mission.” 2023. Kiva. https://www.kiva.org/about.
Townsend, Robert. 2015. “Ending Poverty and Hunger by 2030 : An Agenda for the Global Food System.” World Bank Group. https://documents.worldbank.org/en/publication/documents-reports/documentdetail/700061468334490682/ending-poverty-and-hunger-by-2030-an-agenda-for-the-global-food-system.